Overview

Dataset statistics

Number of variables: 16
Number of observations: 4237
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 529.8 KiB
Average record size in memory: 128.0 B

Variable types

Categorical: 8
Numeric: 8

Warnings

currentSmoker is highly correlated with cigsPerDay (High correlation)
cigsPerDay is highly correlated with currentSmoker (High correlation)
prevalentHyp is highly correlated with sysBP and 1 other field (High correlation)
diabetes is highly correlated with glucose (High correlation)
sysBP is highly correlated with prevalentHyp and 1 other field (High correlation)
diaBP is highly correlated with prevalentHyp and 1 other field (High correlation)
glucose is highly correlated with diabetes (High correlation)
cigsPerDay has 2144 (50.6%) zeros (Zeros)

Reproduction

Analysis started: 2021-07-19 18:43:50.886436
Analysis finished: 2021-07-19 18:44:03.278289
Duration: 12.39 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

Sex
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

Value  Count
0      2419
1      1818

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4237
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      2419   57.1%
1      1818   42.9%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value counts]

Value  Count  Frequency (%)
0      2419   57.1%
1      1818   42.9%

Most occurring characters

Value  Count  Frequency (%)
0      2419   57.1%
1      1818   42.9%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4237   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      2419   57.1%
1      1818   42.9%

Most occurring scripts

Value   Count  Frequency (%)
Common  4237   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      2419   57.1%
1      1818   42.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4237   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      2419   57.1%
1      1818   42.9%

age
Real number (ℝ≥0)

Distinct: 39
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.58154354
Minimum: 32
Maximum: 70
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 32
5-th percentile: 37
Q1: 42
Median: 49
Q3: 56
95-th percentile: 64
Maximum: 70
Range: 38
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 8.570309619
Coefficient of variation (CV): 0.1728528199
Kurtosis: -0.988951626
Mean: 49.58154354
Median Absolute Deviation (MAD): 7
Skewness: 0.2284151287
Sum: 210077
Variance: 73.45020697
Monotonicity: Not monotonic
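
Several of the quantile and descriptive statistics reported for age are tied together by simple arithmetic. The following is a minimal sketch that recomputes the derived figures from the reported ones; the variable names are illustrative, and the relations used (CV = standard deviation / mean, variance = standard deviation squared, range = maximum - minimum, IQR = Q3 - Q1) are the standard textbook definitions rather than anything read from the report's source code.

```python
# Values copied from the age summary; variable names are illustrative only.
mean = 49.58154354
std = 8.570309619
minimum, q1, q3, maximum = 32, 42, 56, 70

# Standard definitions (assumed to match what the report computes).
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(f"CV={cv:.6f}  variance={variance:.4f}  range={value_range}  IQR={iqr}")
```

The same checks can be repeated for any other numeric variable in the report by substituting its reported mean, standard deviation and quantiles.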
[Figure: histogram with fixed size bins (bins=39)]

Common values

Value              Count  Frequency (%)
40                 191    4.5%
46                 182    4.3%
42                 180    4.2%
41                 174    4.1%
48                 173    4.1%
39                 169    4.0%
44                 166    3.9%
45                 162    3.8%
43                 159    3.8%
52                 149    3.5%
Other values (29)  2532   59.8%

Minimum 10 values

Value  Count  Frequency (%)
32     1      < 0.1%
33     5      0.1%
34     18     0.4%
35     42     1.0%
36     84     2.0%
37     92     2.2%
38     144    3.4%
39     169    4.0%
40     191    4.5%
41     174    4.1%

Maximum 10 values

Value  Count  Frequency (%)
70     2      < 0.1%
69     7      0.2%
68     18     0.4%
67     45     1.1%
66     38     0.9%
65     57     1.3%
64     92     2.2%
63     110    2.6%
62     99     2.3%
61     110    2.6%

education
Categorical

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

Value  Count
1.0    1824
2.0    1253
3.0    687
4.0    473

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12711
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4.0
2nd row: 2.0
3rd row: 1.0
4th row: 3.0
5th row: 3.0

Common Values

Value  Count  Frequency (%)
1.0    1824   43.0%
2.0    1253   29.6%
3.0    687    16.2%
4.0    473    11.2%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value counts]

Value  Count  Frequency (%)
1.0    1824   43.0%
2.0    1253   29.6%
3.0    687    16.2%
4.0    473    11.2%

Most occurring characters

Value  Count  Frequency (%)
.      4237   33.3%
0      4237   33.3%
1      1824   14.3%
2      1253   9.9%
3      687    5.4%
4      473    3.7%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     8474   66.7%
Other Punctuation  4237   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      4237   50.0%
1      1824   21.5%
2      1253   14.8%
3      687    8.1%
4      473    5.6%

Other Punctuation
Value  Count  Frequency (%)
.      4237   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12711  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
.      4237   33.3%
0      4237   33.3%
1      1824   14.3%
2      1253   9.9%
3      687    5.4%
4      473    3.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12711  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
.      4237   33.3%
0      4237   33.3%
1      1824   14.3%
2      1253   9.9%
3      687    5.4%
4      473    3.7%

currentSmoker
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

Value  Count
0      2144
1      2093

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4237
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
0      2144   50.6%
1      2093   49.4%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value counts]

Value  Count  Frequency (%)
0      2144   50.6%
1      2093   49.4%

Most occurring characters

Value  Count  Frequency (%)
0      2144   50.6%
1      2093   49.4%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4237   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      2144   50.6%
1      2093   49.4%

Most occurring scripts

Value   Count  Frequency (%)
Common  4237   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      2144   50.6%
1      2093   49.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4237   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      2144   50.6%
1      2093   49.4%

cigsPerDay
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 34
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.064943959
Minimum: 0
Maximum: 70
Zeros: 2144
Zeros (%): 50.6%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 20
95-th percentile: 30
Maximum: 70
Range: 70
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 11.90481717
Coefficient of variation (CV): 1.313280835
Kurtosis: 0.9940249376
Mean: 9.064943959
Median Absolute Deviation (MAD): 0
Skewness: 1.232054394
Sum: 38408.16755
Variance: 141.7246719
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=34)]

Common values

Value              Count  Frequency (%)
0                  2144   50.6%
20                 734    17.3%
30                 217    5.1%
15                 210    5.0%
10                 143    3.4%
9                  130    3.1%
5                  121    2.9%
3                  100    2.4%
40                 80     1.9%
1                  67     1.6%
Other values (24)  291    6.9%

Minimum 10 values

Value  Count  Frequency (%)
0      2144   50.6%
1      67     1.6%
2      18     0.4%
3      100    2.4%
4      9      0.2%
5      121    2.9%
6      18     0.4%
7      12     0.3%
8      11     0.3%
9      130    3.1%

Maximum 10 values

Value  Count  Frequency (%)
70     1      < 0.1%
60     11     0.3%
50     6      0.1%
45     3      0.1%
43     56     1.3%
40     80     1.9%
38     1      < 0.1%
35     22     0.5%
30     217    5.1%
29     1      < 0.1%

BPMeds
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

Value  Count
0.0    4113
1.0    124

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12711
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value  Count  Frequency (%)
0.0    4113   97.1%
1.0    124    2.9%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value counts]

Value  Count  Frequency (%)
0.0    4113   97.1%
1.0    124    2.9%

Most occurring characters

Value  Count  Frequency (%)
0      8350   65.7%
.      4237   33.3%
1      124    1.0%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     8474   66.7%
Other Punctuation  4237   33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      8350   98.5%
1      124    1.5%

Other Punctuation
Value  Count  Frequency (%)
.      4237   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12711  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      8350   65.7%
.      4237   33.3%
1      124    1.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12711  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      8350   65.7%
.      4237   33.3%
1      124    1.0%

prevalentStroke
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

Value  Count
0      4212
1      25

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4237
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      4212   99.4%
1      25     0.6%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value counts]

Value  Count  Frequency (%)
0      4212   99.4%
1      25     0.6%

Most occurring characters

Value  Count  Frequency (%)
0      4212   99.4%
1      25     0.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4237   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      4212   99.4%
1      25     0.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  4237   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      4212   99.4%
1      25     0.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4237   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      4212   99.4%
1      25     0.6%

prevalentHyp
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

Value  Count
0      2922
1      1315

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4237
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
0      2922   69.0%
1      1315   31.0%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value counts]

Value  Count  Frequency (%)
0      2922   69.0%
1      1315   31.0%

Most occurring characters

Value  Count  Frequency (%)
0      2922   69.0%
1      1315   31.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4237   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      2922   69.0%
1      1315   31.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  4237   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      2922   69.0%
1      1315   31.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4237   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      2922   69.0%
1      1315   31.0%

diabetes
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

Value  Count
0      4128
1      109

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4237
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      4128   97.4%
1      109    2.6%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value counts]

Value  Count  Frequency (%)
0      4128   97.4%
1      109    2.6%

Most occurring characters

Value  Count  Frequency (%)
0      4128   97.4%
1      109    2.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4237   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      4128   97.4%
1      109    2.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  4237   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      4128   97.4%
1      109    2.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4237   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      4128   97.4%
1      109    2.6%

totChol
Real number (ℝ≥0)

Distinct: 249
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 236.7257681
Minimum: 107
Maximum: 696
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 107
5-th percentile: 170
Q1: 206
Median: 234
Q3: 262
95-th percentile: 312
Maximum: 696
Range: 589
Interquartile range (IQR): 56

Descriptive statistics

Standard deviation: 44.33084805
Coefficient of variation (CV): 0.1872666774
Kurtosis: 4.215179938
Mean: 236.7257681
Median Absolute Deviation (MAD): 28
Skewness: 0.8762829319
Sum: 1003007.079
Variance: 1965.224089
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
240                 85     2.0%
220                 70     1.7%
260                 62     1.5%
210                 61     1.4%
232                 59     1.4%
250                 57     1.3%
200                 56     1.3%
230                 54     1.3%
225                 54     1.3%
205                 53     1.3%
Other values (239)  3626   85.6%

Minimum 10 values

Value  Count  Frequency (%)
107    1      < 0.1%
113    1      < 0.1%
119    1      < 0.1%
124    1      < 0.1%
126    1      < 0.1%
129    1      < 0.1%
133    1      < 0.1%
135    2      < 0.1%
137    1      < 0.1%
140    2      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
696    1      < 0.1%
600    1      < 0.1%
464    1      < 0.1%
453    1      < 0.1%
439    1      < 0.1%
432    1      < 0.1%
410    3      0.1%
405    1      < 0.1%
398    1      < 0.1%
392    1      < 0.1%

sysBP
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 234
Distinct (%): 5.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 132.3429313
Minimum: 83.5
Maximum: 295
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 83.5
5-th percentile: 104
Q1: 117
Median: 128
Q3: 144
95-th percentile: 175
Maximum: 295
Range: 211.5
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 22.03206212
Coefficient of variation (CV): 0.16647706
Kurtosis: 2.161258192
Mean: 132.3429313
Median Absolute Deviation (MAD): 13
Skewness: 1.146436007
Sum: 560737
Variance: 485.4117613
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
120                 107    2.5%
130                 102    2.4%
110                 96     2.3%
115                 89     2.1%
125                 88     2.1%
124                 84     2.0%
122                 80     1.9%
126                 73     1.7%
128                 73     1.7%
123                 72     1.7%
Other values (224)  3373   79.6%

Minimum 10 values

Value  Count  Frequency (%)
83.5   2      < 0.1%
85     1      < 0.1%
85.5   1      < 0.1%
90     2      < 0.1%
92     1      < 0.1%
92.5   2      < 0.1%
93     2      < 0.1%
93.5   2      < 0.1%
94     3      0.1%
95     7      0.2%

Maximum 10 values

Value  Count  Frequency (%)
295    1      < 0.1%
248    1      < 0.1%
244    1      < 0.1%
243    1      < 0.1%
235    1      < 0.1%
232    1      < 0.1%
230    1      < 0.1%
220    2      < 0.1%
217    1      < 0.1%
215    3      0.1%

diaBP
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 146
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 82.89532688
Minimum: 48
Maximum: 142.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 48
5-th percentile: 66
Q1: 75
Median: 82
Q3: 90
95-th percentile: 104.6
Maximum: 142.5
Range: 94.5
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 11.91163788
Coefficient of variation (CV): 0.1436949262
Kurtosis: 1.276485169
Mean: 82.89532688
Median Absolute Deviation (MAD): 7.5
Skewness: 0.713728353
Sum: 351227.5
Variance: 141.887117
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
80                  262    6.2%
82                  152    3.6%
85                  137    3.2%
70                  135    3.2%
81                  131    3.1%
84                  122    2.9%
90                  119    2.8%
78                  116    2.7%
87                  113    2.7%
86                  107    2.5%
Other values (136)  2843   67.1%

Minimum 10 values

Value  Count  Frequency (%)
48     1      < 0.1%
50     1      < 0.1%
51     1      < 0.1%
52     2      < 0.1%
53     1      < 0.1%
54     1      < 0.1%
55     3      0.1%
56     2      < 0.1%
57     6      0.1%
57.5   3      0.1%

Maximum 10 values

Value  Count  Frequency (%)
142.5  1      < 0.1%
140    1      < 0.1%
136    2      < 0.1%
135    2      < 0.1%
133    2      < 0.1%
132    1      < 0.1%
130    5      0.1%
129    1      < 0.1%
128    1      < 0.1%
127.5  1      < 0.1%

BMI
Real number (ℝ≥0)

Distinct: 1363
Distinct (%): 32.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.80166627
Minimum: 15.54
Maximum: 56.8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 15.54
5-th percentile: 20.06
Q1: 23.08
Median: 25.4
Q3: 28.03
95-th percentile: 32.772
Maximum: 56.8
Range: 41.26
Interquartile range (IQR): 4.95

Descriptive statistics

Standard deviation: 4.0712539
Coefficient of variation (CV): 0.157790348
Kurtosis: 2.682168382
Mean: 25.80166627
Median Absolute Deviation (MAD): 2.48
Skewness: 0.9843050542
Sum: 109321.66
Variance: 16.57510832
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                Count  Frequency (%)
22.19                18     0.4%
22.54                18     0.4%
23.48                18     0.4%
22.91                18     0.4%
25.09                16     0.4%
23.09                16     0.4%
25.23                13     0.3%
23.1                 13     0.3%
22.73                13     0.3%
27.73                12     0.3%
Other values (1353)  4082   96.3%

Minimum 10 values

Value  Count  Frequency (%)
15.54  1      < 0.1%
15.96  1      < 0.1%
16.48  1      < 0.1%
16.59  2      < 0.1%
16.61  1      < 0.1%
16.69  1      < 0.1%
16.71  1      < 0.1%
16.73  1      < 0.1%
16.75  1      < 0.1%
16.87  1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
56.8   1      < 0.1%
51.28  1      < 0.1%
45.8   1      < 0.1%
45.79  1      < 0.1%
44.71  1      < 0.1%
44.55  1      < 0.1%
44.27  1      < 0.1%
44.09  1      < 0.1%
43.69  1      < 0.1%
43.67  1      < 0.1%

heartRate
Real number (ℝ≥0)

Distinct: 73
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 75.87892377
Minimum: 44
Maximum: 143
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 44
5-th percentile: 60
Q1: 68
Median: 75
Q3: 83
95-th percentile: 98
Maximum: 143
Range: 99
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 12.02659635
Coefficient of variation (CV): 0.158497192
Kurtosis: 0.9074832435
Mean: 75.87892377
Median Absolute Deviation (MAD): 7
Skewness: 0.6444817335
Sum: 321499
Variance: 144.6390198
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value              Count  Frequency (%)
75                 563    13.3%
80                 385    9.1%
70                 305    7.2%
60                 231    5.5%
85                 227    5.4%
72                 222    5.2%
65                 197    4.6%
90                 172    4.1%
68                 151    3.6%
100                98     2.3%
Other values (63)  1686   39.8%

Minimum 10 values

Value  Count  Frequency (%)
44     1      < 0.1%
45     2      < 0.1%
46     1      < 0.1%
47     1      < 0.1%
48     5      0.1%
50     22     0.5%
51     1      < 0.1%
52     17     0.4%
53     11     0.3%
54     12     0.3%

Maximum 10 values

Value  Count  Frequency (%)
143    1      < 0.1%
140    1      < 0.1%
130    1      < 0.1%
125    3      0.1%
122    2      < 0.1%
120    7      0.2%
115    5      0.1%
112    3      0.1%
110    36     0.8%
108    8      0.2%

glucose
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 323
Distinct (%): 7.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 81.81652858
Minimum: 40
Maximum: 394
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 40
5-th percentile: 62
Q1: 72
Median: 78
Q3: 86
95-th percentile: 107
Maximum: 394
Range: 354
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 23.0136147
Coefficient of variation (CV): 0.2812831967
Kurtosis: 63.17104292
Mean: 81.81652858
Median Absolute Deviation (MAD): 7
Skewness: 6.435320143
Sum: 346656.6316
Variance: 529.6264615
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
75                  193    4.6%
77                  168    4.0%
73                  156    3.7%
80                  154    3.6%
70                  152    3.6%
78                  151    3.6%
83                  151    3.6%
74                  142    3.4%
76                  128    3.0%
85                  127    3.0%
Other values (313)  2715   64.1%

Minimum 10 values

Value  Count  Frequency (%)
40     2      < 0.1%
43     1      < 0.1%
44     2      < 0.1%
45     4      0.1%
47     3      0.1%
48     1      < 0.1%
50     3      0.1%
52     2      < 0.1%
53     5      0.1%
54     5      0.1%

Maximum 10 values

Value  Count  Frequency (%)
394    2      < 0.1%
386    1      < 0.1%
370    1      < 0.1%
368    1      < 0.1%
348    1      < 0.1%
332    1      < 0.1%
325    1      < 0.1%
320    1      < 0.1%
297    1      < 0.1%
294    1      < 0.1%

TenYearCHD
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

Value  Count
0      3594
1      643

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4237
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
0      3594   84.8%
1      643    15.2%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value counts]

Value  Count  Frequency (%)
0      3594   84.8%
1      643    15.2%

Most occurring characters

Value  Count  Frequency (%)
0      3594   84.8%
1      643    15.2%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4237   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      3594   84.8%
1      643    15.2%

Most occurring scripts

Value   Count  Frequency (%)
Common  4237   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      3594   84.8%
1      643    15.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4237   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      3594   84.8%
1      643    15.2%

Interactions

[Figures: 64 interaction plots (Matplotlib images)]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
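
As a minimal sketch of the calculation just described, the following pure-Python function divides the covariance of two sequences by the product of their standard deviations. The function name `pearson_r` and the toy data are illustrative assumptions, not part of the report, and the common 1/n factors are cancelled rather than computed.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors of the covariance and of both standard deviations cancel,
    so sums of products of deviations are used directly.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    # Raises ZeroDivisionError if either variable is constant.
    return cov / math.sqrt(ss_x * ss_y)


# Made-up toy data: y is an exact increasing linear function of x.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
print(pearson_r(x, y))
# Shifting and positively rescaling y leaves r unchanged.
print(pearson_r(x, [3 * v + 7 for v in y]))
```

A constant variable has zero standard deviation, so the sketch deliberately lets the division fail in that case rather than returning a made-up value.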

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
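
The rank-based calculation can be sketched in pure Python as follows: both variables are converted to ranks, and the covariance of the ranks is divided by the product of their standard deviations. The helper names `average_ranks` and `spearman_rho` are illustrative, and giving tied values the mean of their rank positions is an assumption (a common convention, not something stated in the report).

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(ss_x * ss_y)


# Made-up toy data: y = x**3 is nonlinear but strictly increasing in x.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

Because only the ordering of the values enters the calculation, any strictly increasing transformation of either variable leaves the result unchanged.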

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
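
The pair-counting definition above can be sketched directly. This version implements exactly the formula as worded (often called tau-a); the function name `kendall_tau_a` and the toy data are illustrative. Tied pairs count as neither concordant nor discordant here, whereas many libraries compute a tie-adjusted variant (tau-b), so results may differ from library output when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair oppositely
        # sign == 0 means a tie in x or y: counted as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Made-up toy data: one of the three pairs is discordant.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The double loop over all pairs is quadratic in the number of observations, which is fine for illustration but slow for large samples.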

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
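
A minimal sketch of a bias-corrected Cramér's V, computed from a contingency table of observed counts, is shown below. It follows the correction as it is commonly stated for Bergsma's estimator; the function name `cramers_v_corrected`, the input format, and the toy tables are assumptions for illustration, and whether the report's own implementation matches this in every detail is not confirmed by the text above.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table given as a list of rows.

    Assumes every row total and column total is positive.
    """
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]
    n = sum(row_totals)

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Made-up toy tables: exactly independent counts, then perfectly associated counts.
print(cramers_v_corrected([[10, 20], [20, 40]]))
print(cramers_v_corrected([[50, 0], [0, 50]]))
```

The `max(0.0, ...)` clamp keeps the corrected statistic non-negative when the observed association is weaker than the expected sampling bias.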

Missing values

[Figure: a simple visualization of nullity by column.]
[Figure: nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

First rows

   Sex  age  education  currentSmoker  cigsPerDay  BPMeds  prevalentStroke  prevalentHyp  diabetes  totChol  sysBP  diaBP  BMI    heartRate  glucose  TenYearCHD
0  1    39   4.0        0              0.0         0.0     0                0             0         195.0    106.0  70.0   26.97  80.0       77.0     0
1  0    46   2.0        0              0.0         0.0     0                0             0         250.0    121.0  81.0   28.73  95.0       76.0     0
2  1    48   1.0        1              20.0        0.0     0                0             0         245.0    127.5  80.0   25.34  75.0       70.0     0
3  0    61   3.0        1              30.0        0.0     0                1             0         225.0    150.0  95.0   28.58  65.0       103.0    1
4  0    46   3.0        1              23.0        0.0     0                0             0         285.0    130.0  84.0   23.10  85.0       85.0     0
5  0    43   2.0        0              0.0         0.0     0                1             0         228.0    180.0  110.0  30.30  77.0       99.0     0
6  0    63   1.0        0              0.0         0.0     0                0             0         205.0    138.0  71.0   33.11  60.0       85.0     1
7  0    45   2.0        1              20.0        0.0     0                0             0         313.0    100.0  71.0   21.68  79.0       78.0     0
8  1    52   1.0        0              0.0         0.0     0                1             0         260.0    141.5  89.0   26.36  76.0       79.0     0
9  1    43   1.0        1              30.0        0.0     0                1             0         225.0    162.0  107.0  23.61  93.0       88.0     0

Last rows

      Sex  age  education  currentSmoker  cigsPerDay  BPMeds  prevalentStroke  prevalentHyp  diabetes  totChol  sysBP  diaBP  BMI    heartRate  glucose     TenYearCHD
4227  0    50   1.0        0              0.0         0.0     0                1             1         260.0    190.0  130.0  43.67  85.0       260.000000  0
4228  0    51   3.0        1              20.0        0.0     0                1             0         251.0    140.0  80.0   25.60  75.0       85.736842   0
4229  0    56   1.0        1              3.0         0.0     0                1             0         268.0    170.0  102.0  22.89  57.0       82.052632   0
4230  1    58   3.0        0              0.0         0.0     0                1             0         187.0    141.0  81.0   24.96  80.0       81.000000   0
4231  1    68   1.0        0              0.0         0.0     0                1             0         176.0    168.0  97.0   23.14  60.0       79.000000   1
4232  1    50   1.0        1              1.0         0.0     0                1             0         313.0    179.0  92.0   25.97  66.0       86.000000   1
4233  1    51   3.0        1              43.0        0.0     0                0             0         207.0    126.5  80.0   19.71  65.0       68.000000   0
4234  0    48   2.0        1              20.0        0.0     0                0             0         248.0    131.0  72.0   22.00  84.0       86.000000   0
4235  0    44   1.0        1              15.0        0.0     0                0             0         210.0    126.5  87.0   19.16  86.0       79.052632   0
4236  0    52   2.0        0              0.0         0.0     0                0             0         269.0    133.5  83.0   21.47  80.0       107.000000  0